Learningtower: Comparative Analysis of PISA 2022 and Historical Data
Shabarish Sai and Guan Ru Chen
Department of Econometrics and Business Statistics
2024-10-18
Contributors
Shabarish Sai Subramanian
Guan Ru Chen
Dianne Cook
Kevin Y.X. Wang
Priya Ravindra Dingorkar
Introduction
The learningtower R package is designed to streamline the analysis of data from the OECD’s Programme for International Student Assessment (PISA). It provides access to datasets from 2000 to 2022, allowing researchers to explore trends in education, student performance, and other contextual factors. It simplifies the handling of large, complex datasets, making it easier to conduct comparative studies across countries and years. We are currently updating the learningtower package for the 2022 cycle to ensure compatibility with the latest PISA data and functionality.
Collection of Data
PISA data is collected every three years from over 70 countries, targeting 15-year-old students. The assessment measures students’ abilities in reading, mathematics, and science through standardized tests. In addition to the tests, questionnaires are administered to students, teachers, and school principals to gather contextual data on educational environments, socio-economic status, and more. This comprehensive approach helps provide insights into factors that affect student performance across different educational systems worldwide.
The student dataset includes the following columns: year, country, school_id, student_id, mother_educ, father_educ, gender, computer, internet, math, read, science, stu_wgt, desk, room, dishwasher, television, computer_n, laptop_n, car, book, wealth, escs, and curiosity. These columns provide comprehensive details about the students’ background, academic performance, and access to resources, offering a robust dataset for analysis of educational outcomes and socio-economic factors.
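To make the role of the stu_wgt column concrete, here is a minimal base R sketch of a weighted gender gap. The data frame `toy` and every value in it are invented for illustration; only the column names (gender, math, stu_wgt) come from the student dataset.

```r
# Toy data: all values are made up; only the column names mirror the student dataset.
toy <- data.frame(
  gender  = c("female", "female", "male", "male"),
  math    = c(500, 520, 480, 530),
  stu_wgt = c(1, 3, 2, 2)
)

# Weighted mean of math scores within each gender, using the student weights.
weighted_avg <- sapply(
  split(toy, toy$gender),
  function(d) weighted.mean(d$math, d$stu_wgt)
)

# Gender gap defined as girls minus boys.
gap <- unname(weighted_avg["female"] - weighted_avg["male"])

print(weighted_avg)
print(gap)
```

Students with larger weights represent more of the population, so they pull the group average further than an unweighted mean would allow.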
Gender Gap Analysis: Maths
# Packages used in this analysis
library(tidyverse)
library(here)

# Load the country names and the 2022 student data, then join them
load(here("data/countrycode.rda"))
student_2022 <- readRDS(here("data/student_2022.rds"))
student_country <- left_join(student_2022, countrycode, by = "country")

# Drop rows with missing gender, math score, or student weight
student <- student_country %>%
  filter(!is.na(gender), !is.na(math), !is.na(stu_wgt))

# Compute average math scores and gender diff (results are cached on disk)
if (!file.exists(here("data/math_diff_conf_intervals.rda"))) {
  math_diff_df <- student %>%
    group_by(gender, country_name) %>%
    summarise(avg = weighted.mean(math, stu_wgt), .groups = "drop") %>%
    pivot_wider(id_cols = country_name, names_from = gender, values_from = avg) %>%
    mutate(diff = female - male,
           country_name = fct_reorder(country_name, diff))

  # Compute bootstrap samples: resample students within country and gender
  set.seed(2024)
  boot_ests_math <- map_dfr(1:100, ~ {
    student %>%
      group_by(country_name, gender) %>%
      sample_n(size = n(), replace = TRUE) %>%
      summarise(avg = weighted.mean(math, stu_wgt), .groups = "drop") %>%
      pivot_wider(id_cols = country_name, names_from = gender, values_from = avg) %>%
      mutate(diff = female - male,
             country_name = fct_reorder(country_name, diff)) %>%
      mutate(boot_id = .x)
  })

  # Compute bootstrap confidence intervals
  # (5th and 95th ordered values of 100 replicates, roughly a 90% interval)
  math_diff_conf_intervals <- boot_ests_math %>%
    group_by(country_name) %>%
    summarise(lower = sort(diff)[5],
              upper = sort(diff)[95],
              .groups = "drop") %>%
    left_join(math_diff_df, by = "country_name") %>%
    mutate(country_name = fct_reorder(country_name, diff)) %>%
    mutate(score_class = factor(
      case_when(
        lower < 0 & upper <= 0 ~ "boys",
        lower < 0 & upper >= 0 ~ "nodiff",
        lower >= 0 & upper > 0 ~ "girls"
      ),
      levels = c("boys", "nodiff", "girls")
    ))

  save(math_diff_conf_intervals, file = here("data/math_diff_conf_intervals.rda"))
} else {
  load(here("data/math_diff_conf_intervals.rda"))
}

math_plot <- ggplot(math_diff_conf_intervals,
                    aes(x = diff, y = fct_reorder(country_name, diff), col = score_class)) +
  geom_point(size = 3) +  # Larger points for better visibility
  geom_errorbar(aes(xmin = lower, xmax = upper), width = 0.5) +  # Error bars with appropriate width
  geom_vline(xintercept = 0, color = "gray", linetype = "dashed") +  # Dashed line at x = 0 for reference
  scale_colour_manual("", values = c("boys" = "#3288bd",
                                     "nodiff" = "#969696",
                                     "girls" = "#f46d43")) +  # Custom colors for groups
  labs(y = "Country", x = "Difference (Girls - Boys)",
       title = "Math Gender Gap Analysis") +  # Axis titles
  theme_minimal() +  # Clean theme
  theme(
    axis.text.y = element_text(size = 8, hjust = 1),  # Smaller y-axis text, right-aligned
    axis.text.x = element_text(size = 10),  # Normal size for x-axis labels
    plot.title = element_text(hjust = 0.5, size = 14, face = "bold"),  # Centered title, bold
    plot.margin = margin(20, 20, 20, 150)  # Add margin for better readability
  ) +
  scale_x_continuous(limits = c(-70, 70),  # Set x-axis limits
                     breaks = seq(-60, 60, 20),  # Define breaks
                     labels = abs(seq(-60, 60, 20))) +  # Show absolute values for x-axis
  annotate("text", x = 55, y = 1, label = "Girls", size = 4, color = "red") +  # Label for Girls
  annotate("text", x = -55, y = 1, label = "Boys", size = 4, color = "blue")  # Label for Boys

# Print the plot
print(math_plot)
Explanation of Gender Gap: Math Scores
This graphic shows the gender gap in mathematics across countries. The x-axis gives the difference in weighted average maths scores (girls’ scores - boys’ scores), and the y-axis lists the countries, ordered by that difference. Each point is a country’s average difference, and the horizontal line through it is a bootstrap confidence interval. Colour encodes the result: blue points mark countries where boys outperform girls, red points mark countries where girls outperform boys, and grey points mark countries where the interval spans zero, so no gender difference can be discerned. Boys outperform girls in many countries, while the reverse holds in only a small number.
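The interval behind each point can be illustrated with a small percentile bootstrap. The following base R sketch uses simulated scores for a single hypothetical country; the means, spreads, weights, and the helper names `gap` and `one_replicate` are assumptions made for illustration and are not taken from the PISA data.

```r
set.seed(2024)

# Simulated scores for one hypothetical country; all numbers are invented.
n <- 200
toy <- data.frame(
  gender  = rep(c("female", "male"), each = n),
  math    = c(rnorm(n, mean = 480, sd = 90), rnorm(n, mean = 495, sd = 90)),
  stu_wgt = runif(2 * n, min = 0.5, max = 2)
)

# Weighted girls-minus-boys difference for one data set.
gap <- function(d) {
  f <- d[d$gender == "female", ]
  m <- d[d$gender == "male", ]
  weighted.mean(f$math, f$stu_wgt) - weighted.mean(m$math, m$stu_wgt)
}

# Resample rows with replacement within each gender, then recompute the gap.
one_replicate <- function(d) {
  rows_by_gender <- split(seq_len(nrow(d)), d$gender)
  idx <- unlist(lapply(
    rows_by_gender,
    function(i) i[sample.int(length(i), replace = TRUE)]
  ))
  gap(d[idx, ])
}

boot_gaps <- replicate(100, one_replicate(toy))

# 5th and 95th ordered values of 100 replicates: roughly a 90% interval.
lower <- sort(boot_gaps)[5]
upper <- sort(boot_gaps)[95]

# TRUE when the interval lies entirely on one side of zero.
excludes_zero <- lower > 0 || upper < 0

print(c(lower = lower, estimate = gap(toy), upper = upper))
```

With only 100 replicates the interval endpoints are themselves noisy, so a larger number of replicates would be a reasonable choice where computing time allows.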
Gender Gap Analysis: Reading Scores
# Packages used in this analysis
library(tidyverse)
library(here)
library(plotly)

# Load the country names and the 2022 student data, then join them
load(here("data/countrycode.rda"))
student_2022 <- readRDS(here("data/student_2022.rds"))
student_country <- left_join(student_2022, countrycode, by = "country")

# Subset data and drop missing values for reading scores
student <- student_country %>%
  filter(!is.na(gender), !is.na(read), !is.na(stu_wgt))

# Compute average reading scores and gender diff (results are cached on disk)
if (!file.exists(here("data/read_diff_conf_intervals.rda"))) {
  read_diff_df <- student %>%
    group_by(gender, country_name) %>%
    summarise(avg = weighted.mean(read, stu_wgt), .groups = "drop") %>%
    pivot_wider(id_cols = country_name, names_from = gender, values_from = avg) %>%
    mutate(diff = female - male,
           country_name = fct_reorder(country_name, diff))

  # Compute bootstrap samples: resample students within country and gender
  set.seed(2024)  # Seed added so the intervals are reproducible
  boot_ests_read <- map_dfr(1:100, ~ {
    student %>%
      group_by(country_name, gender) %>%
      sample_n(size = n(), replace = TRUE) %>%
      summarise(avg = weighted.mean(read, stu_wgt), .groups = "drop") %>%
      pivot_wider(id_cols = country_name, names_from = gender, values_from = avg) %>%
      mutate(diff = female - male,
             country_name = fct_reorder(country_name, diff)) %>%
      mutate(boot_id = .x)
  })

  # Compute bootstrap confidence intervals
  # (5th and 95th ordered values of 100 replicates, roughly a 90% interval)
  read_diff_conf_intervals <- boot_ests_read %>%
    group_by(country_name) %>%
    summarise(lower = sort(diff)[5],
              upper = sort(diff)[95],
              .groups = "drop") %>%
    left_join(read_diff_df, by = "country_name") %>%
    mutate(country_name = fct_reorder(country_name, diff)) %>%
    mutate(score_class = factor(
      case_when(
        lower < 0 & upper <= 0 ~ "boys",
        lower < 0 & upper >= 0 ~ "nodiff",
        lower >= 0 & upper > 0 ~ "girls"
      ),
      levels = c("boys", "nodiff", "girls")
    ))

  save(read_diff_conf_intervals, file = here("data/read_diff_conf_intervals.rda"))
} else {
  load(here("data/read_diff_conf_intervals.rda"))
}

# Plot reading scores with ggplot2
read_plot <- ggplot(read_diff_conf_intervals,
                    aes(diff, country_name, col = score_class)) +
  scale_colour_manual("", values = c("boys" = "#3288bd",
                                     "nodiff" = "#969696",
                                     "girls" = "#f46d43")) +
  geom_point() +
  geom_errorbar(aes(xmin = lower, xmax = upper), width = 0) +
  geom_vline(xintercept = 0, color = "#969696") +
  labs(y = "", x = "", title = "Reading") +
  theme(legend.position = "none") +
  annotate("text", x = 50, y = 1, label = "Girls") +
  annotate("text", x = -50, y = 1, label = "Boys") +
  scale_x_continuous(limits = c(-70, 70),
                     breaks = seq(-60, 60, 20),
                     labels = abs(seq(-60, 60, 20)))

# Convert the ggplot object to an interactive Plotly plot
read_plotly <- ggplotly(read_plot)

# Display the interactive Plotly plot
read_plotly
Explanation of Gender Gap: Reading Scores
This graph shows the gender gap in reading scores across countries. The x-axis gives the difference in average reading scores (girls’ scores - boys’ scores), and the y-axis lists the countries. Each point is a country’s average gender gap in reading, and the line through it is the bootstrap confidence interval. The vertical line at zero marks no difference. The red points and lines show that in the majority of countries girls perform significantly better than boys in reading, with differences well into positive values, and few countries show boys outperforming girls. This reflects the global pattern of girls scoring higher on reading assessments.
Gender Gap Analysis: Science
# Packages used in this analysis
library(tidyverse)
library(here)
library(plotly)

# Load the country names and the 2022 student data, then join them
load(here("data/countrycode.rda"))
student_2022 <- readRDS(here("data/student_2022.rds"))
student_country <- left_join(student_2022, countrycode, by = "country")

# Subset data and drop missing values for science scores
student <- student_country %>%
  filter(!is.na(gender), !is.na(science), !is.na(stu_wgt))

# Compute average science scores and gender diff (results are cached on disk)
if (!file.exists(here("data/sci_diff_conf_intervals.rda"))) {
  sci_diff_df <- student %>%
    group_by(gender, country_name) %>%
    summarise(avg = weighted.mean(science, stu_wgt), .groups = "drop") %>%
    pivot_wider(id_cols = country_name, names_from = gender, values_from = avg) %>%
    mutate(diff = female - male,
           country_name = fct_reorder(country_name, diff))

  # Compute bootstrap samples: resample students within country and gender
  set.seed(2024)  # Seed added so the intervals are reproducible
  boot_ests_sci <- map_dfr(1:100, ~ {
    student %>%
      group_by(country_name, gender) %>%
      sample_n(size = n(), replace = TRUE) %>%
      summarise(avg = weighted.mean(science, stu_wgt), .groups = "drop") %>%
      pivot_wider(id_cols = country_name, names_from = gender, values_from = avg) %>%
      mutate(diff = female - male,
             country_name = fct_reorder(country_name, diff)) %>%
      mutate(boot_id = .x)
  })

  # Compute bootstrap confidence intervals
  # (5th and 95th ordered values of 100 replicates, roughly a 90% interval)
  sci_diff_conf_intervals <- boot_ests_sci %>%
    group_by(country_name) %>%
    summarise(lower = sort(diff)[5],
              upper = sort(diff)[95],
              .groups = "drop") %>%
    left_join(sci_diff_df, by = "country_name") %>%
    mutate(country_name = fct_reorder(country_name, diff)) %>%
    mutate(score_class = factor(
      case_when(
        lower < 0 & upper <= 0 ~ "boys",
        lower < 0 & upper >= 0 ~ "nodiff",
        lower >= 0 & upper > 0 ~ "girls"
      ),
      levels = c("boys", "nodiff", "girls")
    ))

  save(sci_diff_conf_intervals, file = here("data/sci_diff_conf_intervals.rda"))
} else {
  load(here("data/sci_diff_conf_intervals.rda"))
}

# Plot science scores with ggplot2
sci_plot <- ggplot(sci_diff_conf_intervals,
                   aes(diff, country_name, col = score_class)) +
  scale_colour_manual("", values = c("boys" = "#3288bd",
                                     "nodiff" = "#969696",
                                     "girls" = "#f46d43")) +
  geom_point() +
  geom_errorbar(aes(xmin = lower, xmax = upper), width = 0) +
  geom_vline(xintercept = 0, color = "#969696") +
  labs(y = "", x = "", title = "Science") +
  theme(legend.position = "none") +
  annotate("text", x = 50, y = 1, label = "Girls") +
  annotate("text", x = -50, y = 1, label = "Boys") +
  scale_x_continuous(limits = c(-70, 70),
                     breaks = seq(-60, 60, 20),
                     labels = abs(seq(-60, 60, 20)))

# Convert the ggplot object to an interactive Plotly plot
sci_plotly <- ggplotly(sci_plot)

# Display the interactive Plotly plot
sci_plotly
Explanation of Gender Gap: Science Scores
This graph shows the gender gap in science scores across countries, that is, the difference between girls’ and boys’ average science scores. The x-axis gives the gender difference (girls’ scores - boys’ scores), and the y-axis lists the countries. Red points and lines mark countries where girls outperform boys in science, blue points and lines mark countries where boys outperform girls, and grey points and lines mark countries with no significant gender difference. The vertical line at zero marks no difference, which makes it easy to see that in most countries girls tend to perform better than boys in science, as shown by the positive values on the right side of the chart.
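Statements such as "in most countries" can be checked by tallying the colour classes rather than reading them off the chart. The sketch below applies the same three rules used for score_class to a made-up interval table; the country labels A to E, the bounds, and the helper `classify` are hypothetical and stand in for a real object such as sci_diff_conf_intervals.

```r
# Hypothetical interval table; country labels and bounds are invented.
intervals <- data.frame(
  country_name = c("A", "B", "C", "D", "E"),
  lower = c(-12, -3, 2, 5, -20),
  upper = c(-4, 6, 9, 15, -1)
)

# Same rules as the case_when() used for score_class, written in base R.
classify <- function(lower, upper) {
  ifelse(lower < 0 & upper <= 0, "boys",
    ifelse(lower < 0 & upper >= 0, "nodiff",
      ifelse(lower >= 0 & upper > 0, "girls", NA_character_)
    )
  )
}

intervals$score_class <- factor(
  classify(intervals$lower, intervals$upper),
  levels = c("boys", "nodiff", "girls")
)

# Count of countries per class and the corresponding shares.
counts <- table(intervals$score_class)
shares <- prop.table(counts)

print(counts)
print(round(shares, 2))
```

Running the same tally on the math, reading, and science interval tables would give comparable counts for the three subjects.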